


Quantum Fourier Transform Based Kernel for Solar Irradiance Forecasting

Mechiche-Alami, Nawfel, Rodriguez, Eduardo, Cardemil, Jose M., Droguett, Enrique Lopez

arXiv.org Machine Learning

This study proposes a Quantum Fourier Transform (QFT)-enhanced quantum kernel for short-term time-series forecasting. Exogenous predictors are incorporated by convexly fusing feature-specific kernels. For both quantum and classical models, the only tuned quantities are the feature-mixing weights and the KRR ridge α; classical hyperparameters (γ, r, d) are fixed, with the same validation set size for all models. Experiments are conducted on a noiseless simulator (5 qubits; window length L=32). Limitations and ablations are discussed, and paths toward NISQ execution are outlined. Introduction Quantum Machine Learning (QML) is an emerging discipline that combines the principles of quantum physics with traditional machine learning (ML) to exploit the distinctive characteristics of quantum systems, including superposition and entanglement phenomena [1]. This distinction facilitates the expeditious execution of certain tasks [2], such as classification and dimensionality reduction, where QML has demonstrated significant acceleration [3]. QML applications have extended to time-series data, leveraging quantum phenomena to model complex temporal dependencies. The goal is to enhance the results of traditional tasks by performing computations on qubits, which can process data more efficiently than classical bits [4, 5]. For example, Thakkar et al. [6] demonstrated that quantum machine-learning methods could enhance financial forecasting by improving both churn prediction and credit-risk assessment. Likewise, Kea et al. [7] developed a hybrid quantum-classical Long Short-Term Memory (QLSTM) to improve stock-price forecasting by leveraging quantum data encoding and high-dimensional quantum representations.
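The fused-kernel regression described in the abstract can be sketched classically. The snippet below is a minimal illustration, not the paper's code: the toy RBF kernels and function names are assumptions standing in for the QFT-based feature-specific kernels, but the convex fusion of kernel matrices and the KRR ridge α match the setup described above.

```python
import numpy as np

def fuse_kernels(kernels, weights):
    # Convex combination of feature-specific kernel matrices:
    # weights are non-negative and sum to one (the tuned mixing weights).
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
    return sum(wi * K for wi, K in zip(w, kernels))

def krr_fit_predict(K_train, y_train, K_test, alpha=1.0):
    # Kernel ridge regression: solve (K + alpha I) c = y, predict K_test @ c.
    n = K_train.shape[0]
    coef = np.linalg.solve(K_train + alpha * np.eye(n), y_train)
    return K_test @ coef

# Toy example: two feature-specific RBF kernels on 1-D exogenous features.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(20, 1))
X2 = rng.normal(size=(20, 1))
y = np.sin(X1[:, 0]) + 0.1 * X2[:, 0]
rbf = lambda A, B: np.exp(-np.square(A - B.T))
K = fuse_kernels([rbf(X1, X1), rbf(X2, X2)], [0.7, 0.3])
pred = krr_fit_predict(K, y, K, alpha=0.1)  # in-sample prediction
```

In practice the mixing weights and α would be chosen on the validation set, as the abstract notes.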


Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Omnilingual ASR team, null, Keren, Gil, Kozhevnikov, Artyom, Meng, Yen, Ropers, Christophe, Setzler, Matthew, Wang, Skyler, Adebara, Ife, Auli, Michael, Balioglu, Can, Chan, Kevin, Cheng, Chierh, Chuang, Joe, Droof, Caley, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Erben, Alexander, Gao, Cynthia, Gonzalez, Gabriel Mejia, Lyu, Kehan, Miglani, Sagar, Pratap, Vineel, Sadagopan, Kaushik Ram, Saleem, Safiyyah, Turkatenko, Arina, Ventayol-Boada, Albert, Yong, Zheng-Xin, Chung, Yu-An, Maillard, Jean, Moritz, Rashel, Mourachko, Alexandre, Williamson, Mary, Yates, Shireen

arXiv.org Artificial Intelligence

Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most--all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging a LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date--including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at https://github.com/facebookresearch/omnilingual-asr.


To unearth their past, Amazonian people turn to 'a language white men understand'

Science

The site, a few kilometers from her own hut in Ipatsé, a Kuikuro village in the Xingu Indigenous territory, was once the backyard of her great-grandparents' house. As she scrapes the brown earth with a trowel, she soon spots a black ceramic shard. It is only about the size of her palm, and this is her first day ever on an archaeological excavation. But she immediately recognizes what the object once was. "It's an alato," she says, showing the piece to a group of archaeologists and other Kuikuro who have gathered to watch the excavation in the village of Anitahagu. An alato, Yamána explains, is a large pan used to cook beiju, a white flatbread made with yucca flour that's eaten almost every day in her village. Her grandmother still has one in the backyard fire pit where she prepares most meals, just as countless Kuikuro women did before her. This alato likely belonged to her great-grandmother on her mother's side.


Human-Level Reasoning: A Comparative Study of Large Language Models on Logical and Abstract Reasoning

Moreira, Benjamin Grando

arXiv.org Artificial Intelligence

Evaluating reasoning ability in Large Language Models (LLMs) is important for advancing artificial intelligence, as it transcends mere linguistic task performance. It involves determining whether these models genuinely understand information, perform inferences, and draw conclusions in a logically valid way. This study compares the logical and abstract reasoning skills of several LLMs - including GPT, Claude, DeepSeek, Gemini, Grok, Llama, Mistral, Perplexity, and Sabiá - using a set of eight custom-designed reasoning questions. The LLM results are benchmarked against human performance on the same tasks, revealing significant differences and indicating areas where LLMs struggle with deduction.


Comprehending Spatio-temporal Data via Cinematic Storytelling using Large Language Models

Shang, Shuo, Kalnis, Panos, Jensen, Christian S.

arXiv.org Artificial Intelligence

Spatio-temporal data captures complex dynamics across both space and time, yet traditional visualizations are complex, require domain expertise, and often fail to resonate with broader audiences. Here, we propose MapMuse, a storytelling-based framework for interpreting spatio-temporal datasets, transforming them into compelling, narrative-driven experiences. We utilize large language models and employ retrieval augmented generation (RAG) and agent-based techniques to generate comprehensive stories. Drawing on principles common in cinematic storytelling, we emphasize clarity, emotional connection, and audience-centric design. As a case study, we analyze a dataset of taxi trajectories. Two perspectives are presented: a captivating story based on a heat map that visualizes millions of taxi trip endpoints to uncover urban mobility patterns; and a detailed narrative following a single long taxi journey, enriched with city landmarks and temporal shifts. By portraying locations as characters and movement as plot, we argue that data storytelling drives insight, engagement, and action from spatio-temporal information. The case study illustrates how MapMuse can bridge the gap between data complexity and human understanding. The aim of this short paper is to provide a glimpse of the potential of the cinematic storytelling technique as an effective communication tool for spatio-temporal data, as well as to describe open problems and opportunities for future research.


Benchmarking noisy label detection methods

Pickler, Henrique, Kamassury, Jorge K. S., Silva, Danilo

arXiv.org Machine Learning

Label noise is a common problem in real-world datasets, affecting both model training and validation. Clean data are essential for achieving strong performance and ensuring reliable evaluation. While various techniques have been proposed to detect noisy labels, there is no clear consensus on optimal approaches. We perform a comprehensive benchmark of detection methods by decomposing them into three fundamental components: label agreement function, aggregation method, and information gathering approach (in-sample vs out-of-sample). This decomposition can be applied to many existing detection methods, and enables systematic comparison across diverse approaches. To fairly compare methods, we propose a unified benchmark task, detecting a fraction of training samples equal to the dataset's noise rate. We also introduce a novel metric: the false negative rate at this fixed operating point. We identify that in-sample information gathering using average probability aggregation combined with the logit margin as the label agreement function achieves the best results across most scenarios. Our findings provide practical guidance for designing new detection methods and selecting techniques for specific applications. Keywords: Noisy label detection, Noisy labels, Dataset cleaning, Data quality, Benchmark, Neural networks 1. Introduction Most supervised learning methods assume a perfectly labeled dataset. However, training data often contain incorrectly labeled instances. Even large, standard benchmark datasets, such as CIFAR, ImageNet, and MS-COCO, are known to have noisy labels [1, 2].
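The three-component decomposition can be illustrated with a minimal sketch (all names hypothetical, not the authors' code): the logit margin as the label agreement function, flagging a fraction of samples equal to the noise rate, and the false negative rate at that fixed operating point. A full pipeline would also aggregate scores, e.g. averaging probabilities across training epochs.

```python
import numpy as np

def logit_margin(logits, labels):
    # Label agreement function: margin between the given label's logit
    # and the largest competing logit (low margin -> likely noisy label).
    idx = np.arange(len(labels))
    given = logits[idx, labels]
    masked = logits.copy()
    masked[idx, labels] = -np.inf
    return given - masked.max(axis=1)

def detect_noisy(scores, noise_rate):
    # Unified benchmark task: flag the fraction of samples with the
    # lowest agreement scores equal to the dataset's noise rate.
    k = int(round(noise_rate * len(scores)))
    mask = np.zeros(len(scores), dtype=bool)
    mask[np.argsort(scores)[:k]] = True
    return mask

def false_negative_rate(flagged, is_noisy):
    # Metric at the fixed operating point: noisy samples left unflagged.
    return np.sum(~flagged & is_noisy) / max(np.sum(is_noisy), 1)
```

On a toy batch where two of four labels contradict the model's logits, the two low-margin samples are flagged and the FNR is zero.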


Man-Made Heuristics Are Dead. Long Live Code Generators!

Dwivedula, Rohit, Saxena, Divyanshu, Akella, Aditya, Chaudhuri, Swarat, Kim, Daehyeok

arXiv.org Artificial Intelligence

Policy design for various systems controllers has conventionally been a manual process, with domain experts carefully tailoring heuristics for the specific instance in which the policy will be deployed. In this paper, we re-imagine policy design via a novel automated search technique fueled by recent advances in generative models, specifically Large Language Model (LLM)-driven code generation. We outline the design and implementation of PolicySmith, a framework that applies LLMs to synthesize instance-optimal heuristics. We apply PolicySmith to two long-standing systems policies - web caching and congestion control, highlighting the opportunities unraveled by this LLM-driven heuristic search. For caching, PolicySmith discovers heuristics that outperform established baselines on standard open-source traces. For congestion control, we show that PolicySmith can generate safe policies that integrate directly into the Linux kernel.
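The generate-and-test loop implied by such LLM-driven heuristic search can be sketched as follows. Everything here is a stand-in: `llm_mutate` is a stub where a real system would prompt a code-generation model, and `evaluate` a stub for a trace-driven simulator; neither reflects PolicySmith's actual interfaces.

```python
import random

def llm_mutate(policy_src):
    # Placeholder for an LLM call that rewrites the candidate heuristic.
    return policy_src + "  # mutated"

def evaluate(policy_src, traces):
    # Placeholder fitness: a real system would compile the heuristic and
    # score it on workload traces (e.g. cache hit rate, throughput).
    return random.random()

def policy_search(seed_policy, traces, rounds=5, beam=3):
    # Beam-style generate-and-test: mutate the current pool via the LLM,
    # keep the top-scoring candidates each round, return the best.
    pool = [seed_policy]
    for _ in range(rounds):
        candidates = pool + [llm_mutate(p) for p in pool]
        candidates.sort(key=lambda p: evaluate(p, traces), reverse=True)
        pool = candidates[:beam]
    return pool[0]
```

The design choice mirrored here is that the LLM only proposes code; selection is driven entirely by measured performance on the deployment instance's traces.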



CNN-TFT explained by SHAP with multi-head attention weights for time series forecasting

Stefenon, Stefano F., Matos-Carvalho, João P., Leithardt, Valderi R. Q., Yow, Kin-Choong

arXiv.org Artificial Intelligence

Convolutional neural networks (CNNs) and transformer architectures offer strengths for modeling temporal data: CNNs excel at capturing local patterns and translational invariances, while transformers effectively model long-range dependencies via self-attention. This paper proposes a hybrid architecture integrating convolutional feature extraction with a temporal fusion transformer (TFT) backbone to enhance multivariate time series forecasting. The CNN module first applies a hierarchy of one-dimensional convolutional layers to distill salient local patterns from raw input sequences, reducing noise and dimensionality. The resulting feature maps are then fed into the TFT, which applies multi-head attention to capture both short- and long-term dependencies and to weigh relevant covariates adaptively. We evaluate the CNN-TFT on a hydroelectric natural flow time series dataset. Experimental results demonstrate that CNN-TFT outperforms well-established deep learning models, with a mean absolute percentage error of up to 2.2%. The explainability of the model is obtained by a proposed Shapley additive explanations with multi-head attention weights (SHAP-MHAW). Our novel architecture, named CNN-TFT-SHAP-MHAW, is promising for applications requiring high-fidelity, multivariate time series forecasts, being available for future analysis at https://github.com/SFStefenon/CNN-TFT-SHAP-MHAW .
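A stripped-down view of the hybrid pipeline, assuming nothing beyond the abstract (NumPy stand-ins, not the authors' model): one-dimensional convolutions distill local features from the raw multivariate sequence, then multi-head self-attention weighs dependencies across time steps.

```python
import numpy as np

def conv1d_features(x, kernels):
    # CNN stage: slide each 1-D kernel over the (time, channels) input
    # and stack the resulting feature maps (valid padding).
    T = x.shape[0]
    maps = []
    for k in kernels:  # each kernel has shape (width, channels)
        w = k.shape[0]
        maps.append([np.sum(x[t:t + w] * k) for t in range(T - w + 1)])
    L = min(len(m) for m in maps)
    return np.array([m[:L] for m in maps]).T  # (time', n_kernels)

def multi_head_attention(x, n_heads=2):
    # Simplified self-attention: split channels into heads, apply scaled
    # dot-product attention per head, concatenate the head outputs.
    T, D = x.shape
    d = D // n_heads
    outs = []
    for h in range(n_heads):
        q = k = v = x[:, h * d:(h + 1) * d]
        scores = q @ k.T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        outs.append(w @ v)
    return np.concatenate(outs, axis=1)

# Toy multivariate series: 32 time steps, 3 covariates, 4 conv kernels.
rng = np.random.default_rng(1)
x = rng.normal(size=(32, 3))
kernels = [rng.normal(size=(3, 3)) for _ in range(4)]
feats = conv1d_features(x, kernels)   # (30, 4) local-pattern features
ctx = multi_head_attention(feats)     # (30, 4) attention-weighted context
```

The real TFT additionally uses learned projections, gating, and static covariate encoders; this sketch only shows the conv-then-attention data flow.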